OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields
Reconstructing 3D shapes from planar cross-sections is a challenge inspired
by downstream applications like medical imaging and geographic informatics. The
input is an in/out indicator function fully defined on a sparse collection of
planes in space, and the output is an interpolation of the indicator function
to the entire volume. Previous works addressing this sparse and ill-posed
problem either produce low quality results, or rely on additional priors such
as target topology, appearance information, or input normal directions. In this
paper, we present OReX, a method for 3D shape reconstruction from slices alone,
featuring a Neural Field as the interpolation prior. A simple neural network is
trained on the input planes to receive a 3D coordinate and return an
inside/outside estimate for the query point. This prior is powerful in inducing
smoothness and self-similarities. The main challenge for this approach is
high-frequency details, as the neural prior tends to over-smooth. To alleviate
this, we offer an iterative estimation architecture and a hierarchical input
sampling scheme that encourage coarse-to-fine training, allowing the network to focus on
high frequencies at later stages. In addition, we identify and analyze a common
ripple-like effect stemming from the mesh extraction step. We mitigate it by
regularizing the spatial gradients of the indicator function around input
in/out boundaries, cutting the problem at the root.
Through extensive qualitative and quantitative experimentation, we
demonstrate that our method is robust and accurate and scales well with the size of
the input. We report state-of-the-art results compared to previous approaches
and recent potential solutions, and demonstrate the benefit of our individual
contributions through analysis and ablation studies.
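
A minimal sketch of the core idea, not OReX's actual architecture (which additionally uses iterative estimation and hierarchical input sampling): an MLP maps a 3D coordinate to an inside/outside logit, is fitted to labels sampled on the input planes, and a spatial-gradient penalty near labeled boundaries illustrates the kind of regularization described in the abstract. Names such as IndicatorField, plane_pts, plane_labels, and boundary_pts are placeholders, not identifiers from the paper.

# Illustrative sketch only: a neural-field indicator trained on cross-section samples,
# with a gradient penalty near in/out boundaries to damp ripple-like meshing artifacts.
import torch
import torch.nn as nn

class IndicatorField(nn.Module):
    def __init__(self, hidden=256, layers=4):
        super().__init__()
        dims = [3] + [hidden] * layers + [1]
        blocks = []
        for i in range(len(dims) - 1):
            blocks.append(nn.Linear(dims[i], dims[i + 1]))
            if i < len(dims) - 2:
                blocks.append(nn.Softplus(beta=100))
        self.net = nn.Sequential(*blocks)

    def forward(self, xyz):                      # xyz: (N, 3) query coordinates
        return self.net(xyz).squeeze(-1)         # raw logit of "inside"

def training_step(model, plane_pts, plane_labels, boundary_pts, grad_weight=0.1):
    """One step: BCE on plane samples plus a gradient penalty at in/out boundaries."""
    logits = model(plane_pts)                    # plane_labels: float in/out targets
    data_loss = nn.functional.binary_cross_entropy_with_logits(logits, plane_labels)

    # Penalize large spatial gradients of the indicator near the labeled boundaries,
    # where steep transitions of the field tend to produce ripples after mesh extraction.
    boundary_pts = boundary_pts.clone().requires_grad_(True)
    b_logits = model(boundary_pts)
    grads, = torch.autograd.grad(b_logits.sum(), boundary_pts, create_graph=True)
    grad_loss = grads.norm(dim=-1).mean()

    return data_loss + grad_weight * grad_loss

In this sketch the final surface would be extracted from the trained field with a standard iso-surfacing step (e.g. Marching Cubes at logit 0); the gradient term only shapes how the field behaves around the input boundaries.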
Human Motion Diffusion as a Generative Prior
Recent work has demonstrated the significant potential of denoising diffusion
models for generating human motion, including text-to-motion capabilities.
However, these methods are restricted by the paucity of annotated motion data,
a focus on single-person motions, and a lack of detailed control. In this
paper, we introduce three forms of composition based on diffusion priors:
sequential, parallel, and model composition. Using sequential composition, we
tackle the challenge of long sequence generation. We introduce DoubleTake, an
inference-time method with which we generate long animations consisting of
sequences of prompted intervals and their transitions, using a prior trained
only for short clips. Using parallel composition, we show promising steps
toward two-person generation. Beginning with two fixed priors as well as a few
two-person training examples, we learn a slim communication block, ComMDM, to
coordinate interaction between the two resulting motions. Lastly, using model
composition, we first train individual priors to complete motions that realize
a prescribed motion for a given joint. We then introduce DiffusionBlending, an
interpolation mechanism to effectively blend several such models to enable
flexible and efficient fine-grained joint and trajectory-level control and
editing. We evaluate the composition methods using an off-the-shelf motion
diffusion model, and further compare the results to dedicated models trained
for these specific tasks.
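
The following is an illustrative sketch of sequential composition in the spirit of DoubleTake, not the paper's implementation: short prompted clips from a pretrained motion diffusion prior are chained through shared transition frames and then re-denoised with those frames held fixed, so the joins stay motion-plausible. sample_clip and inpaint_clip are hypothetical stand-ins for the short-clip prior's sampling and inpainting interfaces.

# Illustrative sketch only: long-sequence generation by chaining prompted short clips
# through blended, then re-denoised, transition windows.
import numpy as np

def sequential_compose(prompts, sample_clip, inpaint_clip,
                       clip_len=120, overlap=20):
    """Chain prompted intervals into one long sequence via shared transitions."""
    clips = [sample_clip(p, length=clip_len) for p in prompts]   # each (T, J, 3)

    for i in range(1, len(clips)):
        # Blend the end of clip i-1 with the start of clip i over the overlap window.
        prev_tail = clips[i - 1][-overlap:]
        next_head = clips[i][:overlap]
        w = np.linspace(0.0, 1.0, overlap)[:, None, None]
        transition = (1.0 - w) * prev_tail + w * next_head

        # Second pass: re-denoise each clip with its transition frames held fixed,
        # so the linear blend becomes a plausible motion under the prior.
        clips[i - 1] = inpaint_clip(prompts[i - 1], clips[i - 1],
                                    fixed=transition, at="end")
        clips[i] = inpaint_clip(prompts[i], clips[i],
                                fixed=transition, at="start")

    # Concatenate, dropping the duplicated overlap frames of each subsequent clip.
    out = [clips[0]]
    for c in clips[1:]:
        out.append(c[overlap:])
    return np.concatenate(out, axis=0)

The parallel (ComMDM) and model (DiffusionBlending) compositions described above would require access to the priors' internal features and score estimates, so they are not sketched here.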